Our project goal is to examine and understand the impact of COVID-19 on integral aspects of the U.S. economy such as GDP and Unemployment Rates; we chose these parameters as they are often used by economists to measure the current state of an economy.
We have first gathered COVID-19 health data recorded daily by the U.S. Center for Disease Control to to understand which months of the pandemic faced the highest number of new cases and deaths. We utilized R to wrangle this data into monthly, national data and created bar plots to help us visualize the peaks and troughs in the pandemic timeline beginning with Jan ‘20 through Mar ‘21.
To study unemployment rates, we gathered Labor Force Statistics from the Current Population Survey Dataset from the U.S. Bureau of Labor Statistics. This dataset helped us understand the shifts in unemployment rates and the 1-monthly change in rates from Jan ‘20 to Mar ‘21. We mapped a timeline of the increase/decrease in cases and deaths along with the change in unemployment rates in order to infer a correlation. In creating these line charts, we calculated new cases and deaths as a proportional of the total (from Jan ‘20 to Mar ‘21) in order to place all three variables (unemployment rates, new cases, and deaths) on one axis.
In order to further understand the economic impacts in specific regions in the U.S., we have focused our GDP analysis on four states with contrasting population sizes and responses to the pandemic. We specifically examined California (strong response to the pandemic, large population size), Texas (relaxed response to the pandemic, large population size), Rhode Island (strong response to the pandemic, small population size), and Utah (relaxed response to the pandemic, small population size). We also explored the GDP changes between various industries such as art, health, technical professions, real estate, and food to better understand which contributors to state GDP were impacted the most. In our analysis of GDP, we created an interactive map with data from 2019 and 2020 in order to compare the changes in GDP from before the pandemic began. We additionally created barplots that displayed the change in GDP from 2017 to 2020 for the four chosen states.
It has been saddening to study this data, quantifying the rise in cases and deaths and amount of individuals who lost their employment or seeing the downward slope of GDP, however, we hope our findings will help inform how government responses to the pandemic impacted local and state economies and the health of residents in the economy.
#Economy VisualizationThe global pandemic this past year has impacted many aspects of our normalcy and daily lives. We utilized the Twitter Developer API to better understand how people across the world were thinking about the current state of our economy, by gathering the 600 most recent tweets published with #economy and creating a word cloud to visualize our findings. The top words in the word cloud are “covid”, “business”, followed by “jobs”, “government”, and “globalhealth” to name a few.
We predict a linear correlation between unemployment rates and number of cases. As the number of COVID-19 cases increased, local and state governments entered lockdowns and further quarantine restrictions which impacted business and services, leading to higher levels of unemployment. In contrast, we hypothesize that an increasing number of deaths may not have had an equivalently strong correlation with unemployment rates. There are a few reasons for this: (1) cases spread and are recorded more quickly, (2) the number of cases is drastically higher than number of deaths, (3) there seemed to be a long-term misconception that only the elderly and immunocompromised (who might represent a smaller subset of the general population) were at high risk for severe loss due to the virus.
In terms of GDP per State, we have chosen California, Texas, Rhode Island, and Utah to examine in detail. We predict that California might have experienced an increase in GDP, due to the number of technology companies residing in the State of California who reported record amounts of sales during the shift to a remote, virtual world. We briefly researched this topic with the California State Assembly. California and Texas each have large populations and were at the top of the charts at distinct points of time for cases/deaths, but followed drastically contrasting responses to the pandemic in terms of lockdowns, mask mandates, etc. We hypothesize that California would have seen an increase or at least continuation of GDP from previous years whereas Texas might have experienced a decline. Rhode Island and Utah are closely related in population amounts, however, Utah was the one of the only states to not issue a lockdown order. Specific counties in Utah with larger populations or tourist sites did issue lockdown orders, whereas Rhode Island’s governor issued a stay at home order in late March itself. We briefly researched stay at home orders through the New York Times. We predict that Rhode Island will experience a steady GDP in 2020 whereas Utah might experience a decline due to tourist locations and highly populated areas (with the highest job density) were the only ones experiencing lockdowns.
| variable | description |
|---|---|
U.S. COVID-19 Cases |
Number of COVID-19 positive cases recorded each day in each state; this data has been wrangled into monthly national data from January 2020 to March 2021. |
U.S. COVID-19 Deaths |
Number of deaths from COVID-19 recorded each day in each state; this data has been wrangled into monthly national data from January 2020 to March 2021. |
U.S. Unemployment Rate per Month |
National recorded unemployment rates on a monthly basis from January 2020 to March 2021. |
U.S. GDP Percent Change per Year |
National monthly change in unemployment rates on a monthly basis from January 2020 to March 2021. |
Industry |
Arts, Health, Food, Eeal Estate, and Technical. |
States |
California, Texas, Utah, and Rhode Island. |
# load twitter library, and all other libraries
library(rtweet)
library(ggplot2)
library(dplyr)
library(tidytext)
# name of created app
appname <- "economic-impact-of-covid19"
## api key
key <- "LDVYl0gTvhvYbNnsiJd3IrTMk"
## api secret
secret <- "HnBymm96BOg1SYZKxvgURKc7ULJ8MKE0Fnj0ZjFfWqey7e34FQ"
# create token named "twitter_token"
twitter_token <- create_token(
app = appname,
consumer_key = key,
consumer_secret = secret,
access_token = '2620473211-WWfgOYF0UiZM0yLw3K0T3WxlG5a2cyH5heHimhC',
access_secret = 'Z6LWeugQ0mi7ANivINfUvWndSkT0ic3gceF0YQn2MC4jY')
data <- search_tweets("#economy", n = 600, include_rts = FALSE)
head(data$text)
## [1] "@sam6con @ShaunBaileyUK @PeterTFortune @cwowomen @CYoungWomen @CwoSouthern @CWOwestmidlands @LdnConservative @scottorywomen @CWONorthWest @LondonCWO The #economy was in a worse state before the pandemic than when the @Conservatives took power from @UKLabour \n\nHomelessness poverty deprivation malnutrition and Foodbanks have increased massively over the last decade\n\n#ToryBrexitDisaster #ToryLies \n#sleaze \n#DontVoteConservative"
## [2] "@JoePorterUK @Conservatives @Young_Tories @Women2Win @ConservativesWM @amandamilling @BorisJohnson @Walking2Win @jaberley @StaffMoorTories The #economy was in a worse state before the pandemic than when the @Conservatives took power from @UKLabour \n\nHomelessness poverty deprivation malnutrition and Foodbanks have increased massively over the last decade\n\n#ToryBrexitDisaster #ToryLies \n#sleaze \n#DontVoteConservative"
## [3] "@MajorityRules2 The #economy was in a worse state before the pandemic than when the @Conservatives took power from @UKLabour \nHomelessness poverty deprivation malnutrition and Foodbanks have increased massively over the last decade #ToryBrexitDisaster #ToryLies \n\n#DontVoteConservative"
## [4] "@geoffreyMyers1 @HonestCitizen62 The economy was in a worse state before the pandemic than when the @Conservatives took power from @UKLabour \nHomelessness poverty deprivation malnutrition and Foodbanks have increased massively over the last decade \n#economy #ToryBrexitDisaster #ToryLies \n\n#DontVoteConservative"
## [5] "A Deeper Look at the Chinese Economy, From Mao to Xi Jinping and Donald Trump, by Dr. Antonio Graceffo, available on Amazon\n#China #Economy #OBOR #BRI #Beltandroad #CPC #tradewar\n\nhttps://t.co/7iFUhyVVFX"
## [6] "Someone needs to ask Melinda Gates if SHE will \npay for the chip so we won't have to carry a #VaccinePassport.\n\n They can just scan us like a lost dog as we go through the turnstiles.\n\n#Economy #reopening #GOP #EventProfs #Florida https://t.co/ad4sERKur0"
# Load libraries
library(wordcloud)
library(RColorBrewer)
library(wordcloud2)
library(tm)
# Create a vector containing only the text
text <- data$text
# Create a corpus
docs <- Corpus(VectorSource(text))
# Clean data
docs <- docs %>%
tm_map(removeNumbers) %>%
tm_map(removePunctuation) %>%
tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
docs <- tm_map(docs, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(docs, content_transformer(tolower)):
## transformation drops documents
docs <- tm_map(docs, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(docs, removeWords, stopwords("english")):
## transformation drops documents
dtm <- TermDocumentMatrix(docs)
matrix <- as.matrix(dtm)
words <- sort(rowSums(matrix),decreasing=TRUE)
df <- data.frame(word = names(words),freq=words)
# Generate the word cloud
set.seed(1234)
wordcloud(words = df$word, freq = df$freq, min.freq = 1, max.words=200, random.order=FALSE, rot.per=0.35, colors=brewer.pal(8, "Dark2"))
Let’s take a look at the total counts of new COVID-19 cases and deaths recorded each month in the U.S. This data has been reported by the U.S. Center for Disease Control in their data set: “United States COVID-19 Cases and Deaths by State over TimeCase Surveillance”. We have cleaned and organized daily COVID-19 reported data by state into national data by month.
## months dates newcases deaths
## 1 JAN20 2020-01-01 7 0
## 2 FEB20 2020-02-01 32 0
## 3 MAR20 2020-03-01 188138 3680
## 4 APR20 2020-04-01 875947 55007
## 5 MAY20 2020-05-01 725178 42053
## 6 JUN20 2020-06-01 847821 21634
## 7 JUL20 2020-07-01 1924412 28249
## 8 AUG20 2020-08-01 1470219 29184
## 9 SEPT20 2020-09-01 1217378 22327
## 10 OCT20 2020-10-01 1928056 24012
## 11 NOV20 2020-11-01 4400344 39397
## 12 DEC20 2020-12-01 6395806 79189
## 13 JAN21 2020-01-01 6098794 97095
## 14 FEB21 2020-02-01 2354530 63431
## 15 MAR21 2020-03-01 1773722 33041
Now, let’s look at how counts of new cases per month arose and fell from Jan ’20 to Mar ’21:
This bar plot helps us visually time line the spread COVID-19. The red bars denote the top six months which experienced the highest number of recorded cases, often referred to as peaks or “second, third” waves. We can infer that July ’20, Oct ’20, Nov ’20, Dec ’20, Jan ’21, and Feb ’21 had the highest number of recorded cases; from this data, we might predict that unemployment rates would be higher during these months and GDP would be lower.
Now, let’s look at how counts of new deaths due to COVID-19 per month arose and fell from Jan ’20 to Mar ’21:
This bar plot reports the months which recorded the highest number of deaths due to COVID-19; the bars in red signify the top five months with the highest amount of deaths. These months include: Apr ’20, May ’20, Dec ’20, Jan ’21, and Feb ’21. In correlation with the highest number of cases recorded, the months of Dec ’20 through Feb ’21 had both the highest number of cases and deaths. It is surprising that April and May of 2020 had two of the highest number of deaths although they didn’t have a peak in cases. However, Apr/May were early months of the pandemic where we had little knowledge about the virus, and their average case count is around 800K which might have propelled the peak in July of 1.9 million cases.
library(magrittr)
library(dplyr)
library(ggplot2)
library(plotly)
library(lubridate)
library(grid)
library(gridExtra)
library("tidyverse")
ur.df <- read.csv("/Users/dishasrivastava/Documents/GitHub/economic-impact-of-covid19/Datasets/Unemployment-Monthly.csv")
covid19.health <- read.csv("/Users/dishasrivastava/Documents/GitHub/economic-impact-of-covid19/Datasets/United_States_COVID-19_Cases_and_Deaths_by_State_over_Time.csv")
# Dropping unused columns
ur.cleaned.df <- subset(ur.df, select = -c(Series.ID, Year, Period))
# Renaming columns
colnames(ur.cleaned.df) <- c("Time_Period", "Unemployment_Rate", "Monthly_Change")
#Result
ur.cleaned.df
## Time_Period Unemployment_Rate Monthly_Change
## 1 2020 Jan 3.5 -2.8
## 2 2020 Feb 3.5 0.0
## 3 2020 Mar 4.4 25.7
## 4 2020 Apr 14.8 236.4
## 5 2020 May 13.3 -10.1
## 6 2020 Jun 11.1 -16.5
## 7 2020 Jul 10.2 -8.1
## 8 2020 Aug 8.4 -17.6
## 9 2020 Sep 7.8 -7.1
## 10 2020 Oct 6.9 -11.5
## 11 2020 Nov 6.7 -2.9
## 12 2020 Dec 6.7 0.0
## 13 2021 Jan 6.3 -6.0
## 14 2021 Feb 6.2 -1.6
## 15 2021 Mar 6.0 -3.2
# Dropping unused columns
covh.cleaned <- subset(covid19.health, select = c(submission_date, state, tot_cases, new_case, tot_death, new_death))
# Renaming columns
colnames(covh.cleaned) <- c("Date_Reported", "State", "Total_Cases", "New_Cases", "Total_Death", "New_Death")
# https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html
covh.cleaned$Date_Reported <- mdy(covh.cleaned$Date_Reported) # returns year/month/day
# Result
head(covh.cleaned$Date_Reported)
## [1] "2021-04-01" "2020-10-15" "2021-03-16" "2021-04-16" "2020-02-14"
## [6] "2020-08-08"
# https://blog.exploratory.io/filter-with-date-function-ce8e84be680
# JANUARY 2020
jan20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported <= as.Date("2020-01-31")))
jan20.newcases <- sum(jan20.covid.df$New_Cases)
jan20.deaths <- sum(jan20.covid.df$New_Death)
# FEBRUARY 2020
feb20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-02-01") & Date_Reported <= as.Date("2020-02-28")))
feb20.newcases <- sum(feb20.covid.df$New_Cases)
feb20.deaths <- sum(feb20.covid.df$New_Death)
# MARCH 2020
mar20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-03-01") & Date_Reported <= as.Date("2020-03-31")))
mar20.newcases <- sum(mar20.covid.df$New_Cases)
mar20.deaths <- sum(mar20.covid.df$New_Death)
# APRIL 2020
apr20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-04-01") & Date_Reported <= as.Date("2020-04-30")))
apr20.newcases <- sum(apr20.covid.df$New_Cases)
apr20.deaths <- sum(apr20.covid.df$New_Death)
# MAY 2020
may20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-05-01") & Date_Reported <= as.Date("2020-05-31")))
may20.newcases <- sum(may20.covid.df$New_Cases)
may20.deaths <- sum(may20.covid.df$New_Death)
# JUNE 2020
jun20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-06-01") & Date_Reported <= as.Date("2020-06-30")))
jun20.newcases <- sum(jun20.covid.df$New_Cases)
jun20.deaths <- sum(jun20.covid.df$New_Death)
# JULY 2020
jul20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-07-01") & Date_Reported <= as.Date("2020-07-31")))
jul20.newcases <- sum(jul20.covid.df$New_Cases)
jul20.deaths <- sum(jul20.covid.df$New_Death)
# AUGUST 2020
aug20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-08-01") & Date_Reported <= as.Date("2020-08-31")))
aug20.newcases <- sum(aug20.covid.df$New_Cases)
aug20.deaths <- sum(aug20.covid.df$New_Death)
# SEPTEMBER 2020
sep20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-09-01") & Date_Reported <= as.Date("2020-09-30")))
sep20.newcases <- sum(sep20.covid.df$New_Cases)
sep20.deaths <- sum(sep20.covid.df$New_Death)
# OCTOBER 2020
oct20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-10-01") & Date_Reported <= as.Date("2020-10-31")))
oct20.newcases <- sum(oct20.covid.df$New_Cases)
oct20.deaths <- sum(oct20.covid.df$New_Death)
# NOVEMBER 2020
nov20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-11-01") & Date_Reported <= as.Date("2020-11-30")))
nov20.newcases <- sum(nov20.covid.df$New_Cases)
nov20.deaths <- sum(nov20.covid.df$New_Death)
# DECEMBER 2020
dec20.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2020-12-01") & Date_Reported <= as.Date("2020-12-31")))
dec20.newcases <- sum(dec20.covid.df$New_Cases)
dec20.deaths <- sum(dec20.covid.df$New_Death)
# JANUARY 2021
jan21.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2021-01-01") & Date_Reported <= as.Date("2021-01-31")))
jan21.newcases <- sum(jan21.covid.df$New_Cases)
jan21.deaths <- sum(jan21.covid.df$New_Death)
# FEBRUARY 2021
feb21.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2021-02-01") & Date_Reported <= as.Date("2021-02-28")))
feb21.newcases <- sum(feb21.covid.df$New_Cases)
feb21.deaths <- sum(feb21.covid.df$New_Death)
# MARCH 2021
mar21.covid.df <- (covh.cleaned %>%
select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
filter(Date_Reported >= as.Date("2021-03-01") & Date_Reported <= as.Date("2021-03-31")))
mar21.newcases <- sum(mar21.covid.df$New_Cases)
mar21.deaths <- sum(mar21.covid.df$New_Death)
# Creating a variable for months
months <- c("JAN20", "FEB20", "MAR20", "APR20", "MAY20", "JUN20", "JUL20", "AUG20", "SEPT20", "OCT20", "NOV20", "DEC20", "JAN21", "FEB21", "MAR21")
# Creating a variable for new cases
newcases <- c(jan20.newcases, feb20.newcases, mar20.newcases, apr20.newcases, may20.newcases, jun20.newcases, jul20.newcases, aug20.newcases, sep20.newcases, oct20.newcases, nov20.newcases, dec20.newcases, jan21.newcases, feb21.newcases, mar21.newcases)
# Creating a variable for deaths
deaths <- c(jan20.deaths, feb20.deaths, mar20.deaths, apr20.deaths, may20.deaths, jun20.deaths, jul20.deaths, aug20.deaths, sep20.deaths, oct20.deaths, nov20.deaths, dec20.deaths, jan21.deaths, feb21.deaths, mar21.deaths)
# Time series
dates <- c("01/01/2020", "02/01/2020", "03/01/2020","04/01/2020", "05/01/2020","06/01/2020", "07/01/2020", "08/01/2020", "09/01/2020", "10/01/2020","11/01/2020", "12/01/2020", "01/01/2021", "02/01/2021", "03/01/2021")
# COVID-19 health per month dataframe
monthly.covid.df <- data.frame(months, dates, newcases, deaths)
monthly.covid.df <- monthly.covid.df %>%
mutate(dates = as.Date(dates, format = "%m/%d/%y"))
# Result
# grid.draw(tableGrob(monthly.covid.df, theme=ttheme_default(base_size = 6) ))
head(monthly.covid.df)
## months dates newcases deaths
## 1 JAN20 2020-01-01 7 0
## 2 FEB20 2020-02-01 32 0
## 3 MAR20 2020-03-01 188138 3680
## 4 APR20 2020-04-01 875947 55007
## 5 MAY20 2020-05-01 725178 42053
## 6 JUN20 2020-06-01 847821 21634
# Find the top 5 highest months
head(arrange(monthly.covid.df,desc(newcases)), n = 5)
# Color labels
top5.cases <- c("#69b3a2","#69b3a2","#69b3a2","#69b3a2","#69b3a2","#69b3a2",
"#B20000","#69b3a2","#69b3a2","#B20000","#B20000","#B20000","#B20000","#B20000","#69b3a2")
# Case labels
num.cases.mils <- c("7", "32", "188K", "875K", "725K","847K", "1.9m", "1.4m", "1.2m", "1.9m", "4.4m", "6.3m", "6m", "2.3m", "1.7m")
# Visualize
b.cases <- barplot(newcases, yaxp=c(0, max(newcases), 15),
ylim=range(pretty(c(0, newcases))), names=months, xlab = "Time Period", ylab = "Number of New Recorded Cases", main = "Number of Newly Recorded COVID-19 Cases per Month in the U.S. 20-21", col = top5.cases)
y<-as.matrix(newcases)
text(b.cases, y+300000,labels=num.cases.mils)
# Find the top 5 highest months
head(arrange(monthly.covid.df,desc(deaths)), n = 5)
# Color labels
deaths.top5 <- c("#90bcff","#90bcff","#90bcff","#B20000","#B20000","#90bcff",
"#90bcff","#90bcff","#90bcff","#90bcff","#90bcff","#B20000","#B20000","#B20000","#90bcff")
# Case labels
num.deaths.mils <- c(0, 0, '3,680', '55K', '42K,', '21K', '28K', '29K', '22K', '24K', '39K', '79K', '97K', '63K', '33K')
# Visualize
b.deaths <- barplot(deaths, yaxp=c(0, max(deaths), 15),
ylim=range(pretty(c(0, deaths))), names=months, xlab = "Time Period", ylab = "Number of New Recorded Deaths", main = "Number of Newly Recorded COVID-19 Deaths per Month in the U.S. 20-21", col = deaths.top5)
y<-as.matrix(deaths)
text(b.deaths ,y+2000, labels=num.deaths.mils)
The table below summarizes the unemployment rates and monthly change in unemployment per month from Jan ’20 to Mar ’21. New cases and deaths per month have been calcuated as a proportion of the total cases and deaths from Jan ’20 to Mar ’21 in order to create a line chart with all three variables.
## Date Unemployment_Rate Monthly_Change percentcases percentdeaths
## 1 JAN20 3.5 -2.8 0.00002317851 0.0000000
## 2 FEB20 3.5 0.0 0.00010595892 0.0000000
## 3 MAR20 4.4 25.7 0.62296558878 0.6836349
## 4 APR20 14.8 236.4 2.90044987507 10.2186703
## 5 MAY20 13.3 -10.1 2.40122112355 7.8122010
## 6 JUN20 11.1 -16.5 2.80731860893 4.0189560
## 7 JUL20 10.2 -8.1 6.37214414227 5.2478270
## 8 AUG20 8.4 -17.6 4.86821293398 5.4215222
## 9 SEPT20 7.8 -7.1 4.03100172501 4.1476949
## 10 OCT20 6.9 -11.5 6.38421021402 4.4607179
## 11 NOV20 6.7 -2.9 14.57049022953 7.3187949
## 12 DEC20 6.7 0.0 21.17789628105 14.7109692
## 13 JAN21 6.3 -6.0 20.19442534241 18.0373733
## 14 FEB21 6.2 -1.6 7.79635782115 11.7835998
## 15 MAR21 6.0 -3.2 5.87317697682 6.1380385
Now, let’s take a look at correlations between unemployment rates, cases, and deaths:
In this line chart, dark red represents unemployment rate, dark blue represents percent contribution of new cases , and finally, dark green represents percent contribution of new deaths.
This graph helps us visualize the correlation between unemployment rates and the spread of COVID-19 in terms of newly recorded cases and deaths each month from Jan ‘20 to Mar ‘21. In the beginning of the timeline, we see stagnancy in unemployment during Jan ‘20 through Feb ‘20, as the pandemic had not yet reached the United States (although we had 39 cases, it had not reached major headlines yet and people were mostly unaware of the gravity of the virus, and we were at 0 deaths). Mar ‘20 is when COVID-19 hit the news headlines, cases shot up, and we experienced our first brutal month of deaths from this pandemic. The graph visualizes this categorial phenomenon, as Apr ‘20 had the highest recorded unemployment rate of 14.8% due to lockdowns and quarantine orders affecting businesses nationwide. We have studied which industries were hit the hardest further down in our analysis. Apr ‘20 through Jul ‘20 we experienced a fluctuating increase and decrease in the number of new cases and deaths, with unemployment numbers climbing and falling in sync. The timeline from Sept ‘20 to Jan ‘21 is especially surprising, as the number of cases and deaths hit their highest peaks yet unemployment rates increased slightly and stayed mostly continuous on a downward slope. This may be due to most local and state governments easing lockdown restrictions and as a result most businesses reopening with a better understanding of how to operate safely (outdoor dining, sanitization routines, plexiglass dividers, etc.). Overall, we can infer from this line chart that in the early stages of the pandemic there was a correlation between number of cases and deaths and unemployment rate, as cases/deaths informed decisions by government officials for safety procedures such as lockdowns; however, in the late stages of the pandemic, we learned how to live with the virus and our business adapted new ways to keep customers safe in order to reopen after a devastating year of closures which lowered unemployment rates although cases/deaths were still climbing. This might also infer that we opened our economy too soon, as Oct ‘20 to Jan ‘21 is when lockdown restrictions eased and cases/deaths climbed drastically, but that is a debate for another day.
Now, let’s take a look at correlations between monthly percent change in unemployment rates, cases, and deaths to confirm our graph from above:
In this line chart as well, dark red represents unemployment rate, dark blue represents percent contribution of new cases , and finally, dark green represents percent contribution of new deaths.
From the timeline developed by this line chart we can infer three notable changes: first, the initial spike in unemployment from Feb ‘20 to Mar ‘20; second, the peak of unemployment change from Mar ‘20 to Apr ‘20; and finally, third, the period of constant fluctuation in changes to unemployment levels from May ‘20 to Mar ‘21. Looking at the monthly changes in unemployment helps us confirm our earlier findings: unemployment rates are directly correlated with new cases and deaths in the early stage of the pandemic, but are unstable in later stages. In Apr ‘20 almost every local and state government in the U.S. entered lockdowns, creating the initial spike in unemployment rates and the large monthly change from Mar ‘20 to Apr ‘20. In the period of May ‘20 through Mar ‘21, individual local and state governments have had varying responses to the pandemic in terms of lockdowns, mask mandates, and reopening guidelines, which makes it difficult to map a constant pattern in unemployment rates during this time period (which we might consider the later stages of the pandemic). In order to look in more detail at the economic impact of COVID-19 on specific regions with specific pandemic responses, let’s now take a look at GDP by state and industry.
# Create a new unemployment dataframe
ur.viz <- ur.cleaned.df
# Renaming columns
colnames(ur.viz) <- c("Date", "Unemployment_Rate", "Monthly_Change")
# Cleaning and wrangling time/date data
ur.viz$Date <- months
ur.viz$Date <- factor(ur.viz$Date, levels = c("JAN20", "FEB20", "MAR20", "APR20", "MAY20", "JUN20", "JUL20", "AUG20", "SEPT20", "OCT20", "NOV20", "DEC20", "JAN21", "FEB21", "MAR21"))
# Add COVID-19 cases data, calculate contribution percentage from each month to total cases
options(scipen = 999)
vector = c()
for (i in 1:length(newcases)) {
vector <- c(vector, (newcases[i]/sum(newcases)*100))
formatC(vector[i], digits = 1)
}
ur.viz$percentcases <- vector
# Add COVID-19 deaths data, calculate contribution percentage from each month to total deaths
vector.d = c()
for (i in 1:length(deaths)) {
vector.d <- c(vector.d, (deaths[i]/sum(deaths)*100))
formatC(vector.d[i], digits = 1)
}
ur.viz$percentdeaths <- vector.d
# Result
head(ur.viz)
## Date Unemployment_Rate Monthly_Change percentcases percentdeaths
## 1 JAN20 3.5 -2.8 0.00002317851 0.0000000
## 2 FEB20 3.5 0.0 0.00010595892 0.0000000
## 3 MAR20 4.4 25.7 0.62296558878 0.6836349
## 4 APR20 14.8 236.4 2.90044987507 10.2186703
## 5 MAY20 13.3 -10.1 2.40122112355 7.8122010
## 6 JUN20 11.1 -16.5 2.80731860893 4.0189560
# Creating the graph
ggplot(ur.viz, aes(x=Date, group = 1)) + geom_line(aes(y = Unemployment_Rate, group = 1), color = "darkred") + geom_line(aes(y = percentcases, group = 1), color="orange", linetype="twodash") + geom_line(aes(y = percentdeaths, group = 1), color="darkgreen", linetype="twodash") + labs(x = "Time Period", y = "Percentage", caption="Source: Center for Disease Control (CDC) and U.S. Bureau of Economic Analysis (BEA)") + ggtitle("Unemployment Rate Trends in the U.S. 2020-21")
ggplot(ur.viz, aes(x=Date, group = 1)) + geom_line(aes(y = Monthly_Change, group = 1), color = "darkred") + geom_line(aes(y = percentcases, group = 1), color="orange", linetype="twodash") + geom_line(aes(y = percentdeaths, group = 1), color="darkgreen", linetype="twodash") + labs(x = "Time Period", y = "Change Percentage", caption="Source: Center for Disease Control (CDC) and U.S. Bureau of Economic Analysis (BEA)") + ggtitle("Monthly Change in Unemployment Rates in the U.S. 2020-21")
## State_Name_Col State_2020_Rank Year_Col GDP_Change_Col
## 1 California 15 Y2017 4.3
## 2 California 15 Y2018 3.1
## 3 California 15 Y2019 3.4
## 4 California 15 Y2020 -2.8
## 5 Texas 23 Y2017 2.8
## 6 Texas 23 Y2018 3.9
## 7 Texas 23 Y2019 2.9
## 8 Texas 23 Y2020 -3.5
## 9 RhodeIsland 38 Y2017 -0.6
## 10 RhodeIsland 38 Y2018 0.8
## 11 RhodeIsland 38 Y2019 1.0
## 12 RhodeIsland 38 Y2020 -4.5
## 13 Utah 1 Y2017 4.4
## 14 Utah 1 Y2018 5.6
## 15 Utah 1 Y2019 3.8
## 16 Utah 1 Y2020 -0.1
This is a set of four bar plots that display the percent change in GDP for California, Texas, Rhode Island, and Utah. Each graph includes data from one of the states mentioned above and displays the change in GDP from 2017 to 2020. Each bar in the chart has been filled with red or green to indicate whether there was an increase or decrease in GDP percentage.
This bar plot displays the change in GDP for the state of California. We can conclude that the state had a very high starting percentage and the value dropped slighlty in 2018 and 2019. However, the most drastic drop in GDP percentage happened in 2020 with a value of -2.3%.
This bar plot illustrates the GDP percent change in Texas. There is a starting percent change of roughly 2.5% and it significantly increased in 2018, but then dropped in 2019. Texas also had a significant decrease in 2020 with a value of -4.5%. This value is almost twice as low as California.
This graph displays the GDP change in Rhode Island from 2017 to 2020. This visualization is unique because there already was a GDP decline in 2017. Also, the GDP values for 2018 and 2019 were not as high as the other states. Lastly, Rhode Island had the highest drop in GDP for 2020 with a value of -4.5%.
This last graph shows the change in GDP for the state of Utah. This state had the highest GDP values for 2017, 2018, and 2019. It is also important to note that there was a very small decrease in 2020.
Explore the Percent Change in GDP by State in 2019
This is an interactive map of the United States where each state is colored in a green gradient based on the GDP percent change. We can conclude that the majority of states had a GDP value between 2 and 5 percent. It is interesting to note that there are no states with negative percentages because this was before the pandemic happened.
Explore the Percent Change in GDP by State in 2020
This is an interactive map that displays the GDP percentage change in the United States for 2020. We can see that the majority of states have values that are between -4 and 0. The only two states that fall within the lowest percent range are Rhode Island and Delaware. We can see that the states that previously had high percentages in 2019 had the lowest values in 2020. This could be because of a significant decrease in businesses across the country.
## # A tibble: 20 x 3
## Name Industry Percent_Change
## <chr> <chr> <dbl>
## 1 "Rhode Island " Real_Estate 0.42
## 2 "Rhode Island " Technical -0.33
## 3 "Rhode Island " Health -0.26
## 4 "Rhode Island " Arts -0.8
## 5 "Rhode Island " Food -0.4
## 6 "Texas" Real_Estate 0.17
## 7 "Texas" Technical -0.13
## 8 "Texas" Health -0.04
## 9 "Texas" Arts -0.25
## 10 "Texas" Food -0.25
## 11 "Utah " Real_Estate 0.580
## 12 "Utah " Technical 0.01
## 13 "Utah " Health 0.02
## 14 "Utah " Arts -0.09
## 15 "Utah " Food -0.34
## 16 "California " Real_Estate 0.14
## 17 "California " Technical 0.08
## 18 "California " Health -0.08
## 19 "California " Arts -0.31
## 20 "California " Food -0.570
This bar plot has been faceted to show the GDP percent change by each industry. The industries that had the lowest values were arts and food. The pandemic caused a lot of businesses to shut down and the arts industry was hit the hardest. It was interesting that California did not have a lower value because the majority of media/content is created in that state. The food industry was also significantly hit because many restaurants have closed or they have reduced their customer limits. It is interesting that Texas had a lower percent change because it is a largely population state. This could be attributed to the lower coronavirus restrictions resulting in less restaurants being closed. We can also observe that the real estate industry was doing significantly better than the other industries. Utah had the highest value so this means that a lot of people were buying properties in Utah when compared to the rest of the states. The technical industry was also interesting because California and Utah were the only states with an increase in value. It is surprising that the percentage change in GDP is very low because the technology hub of the country is located in California. During the pandemic, the major technology companies made impressive profits, so it is surprising as to why it is not reflected in this bar chart.
# Load in necessary libraries
library(leaflet)
library(maptools)
library(rgeos)
library(rgdal)
# This dataset shows the percent change in GDP for each state in the United States from 2017 to 2020.
gdp.state.df <- read.csv("/Users/dishasrivastava/Documents/GitHub/economic-impact-of-covid19/Datasets/GDP-State.csv")
state.df <- read.csv("/Users/dishasrivastava/Documents/GitHub/economic-impact-of-covid19/Datasets/GDP-State.csv")
# This dataset shows the percent change in GDP for each industry in each state in the United States from 2019 to 2020.
gdp.industry.df <- read.csv("/Users/dishasrivastava/Documents/GitHub/economic-impact-of-covid19/Datasets/GDP-Industry.csv")
# This is the shapefile for each state
states <- readOGR(dsn = "/Users/dishasrivastava/Documents/GitHub/economic-impact-of-covid19/Datasets", layer = "cb_2018_us_state_500k")
## Warning in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv =
## use_iconv, : ogrInfo: /Users/dishasrivastava/Documents/GitHub/economic-impact-
## of-covid19/Datasets/cb_2018_us_state_500k.dbf not found
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/dishasrivastava/Documents/GitHub/economic-impact-of-covid19/Datasets", layer: "cb_2018_us_state_500k"
## with 56 features
## It has 0 fields
# Renaming columns
colnames(gdp.industry.df) <- c("Name", "Agriculture", "Mining", "Utilities", "Construction", "Durable_Goods", "Nondurable_Goods", "Wholesale_Trade", "Retail_Trade", "Transportation", "Information", "Finance", "Real_Estate", "Technical", "Management", "Waste_Management", "Education", "Health", "Arts", "Food", "Other", "Government", "Unknown")
# Pivot columns so that data is in a long format
gdp.industry.df <- gdp.industry.df %>%
pivot_longer(cols = !Name, names_to = "Industry", values_to = "Percent_Change")
# Result
head(gdp.industry.df)
## # A tibble: 6 x 3
## Name Industry Percent_Change
## <chr> <chr> <dbl>
## 1 Connecticut Agriculture -4.1
## 2 Connecticut Mining 0
## 3 Connecticut Utilities 0
## 4 Connecticut Construction 0.04
## 5 Connecticut Durable_Goods -0.12
## 6 Connecticut Nondurable_Goods -0.11
# Renaming columns
colnames(gdp.state.df) <- c("State", "Y2017", "Y2018", "Y2019", "Y2020", "Rank-Y2020")
# Removing spaces from all rows and pivoting the data to be in long format
gdp.state.df <- gdp.state.df %>%
mutate(across(where(is.character), str_remove_all, pattern = fixed(" "))) %>%
pivot_longer(cols = Y2017:Y2020, names_to = "Year", values_to = "Change")
# California Subset
ca.gdp <- gdp.state.df[(gdp.state.df$State == "California"), ]
# Texas Subset
tx.gdp <- gdp.state.df[(gdp.state.df$State == "Texas"), ]
# Rhode Island Subset
ri.gdp <- gdp.state.df[(gdp.state.df$State == "RhodeIsland"), ]
# Utah Subset
ut.gdp <- gdp.state.df[(gdp.state.df$State == "Utah"), ]
CA <- ggplot(ca.gdp %>% mutate(fill_value = ifelse(Change > 0, "Increase in GDP", "Decrease in GDP")), aes(x = Year, y = Change, fill = fill_value))+
geom_bar(stat="identity")+
scale_fill_manual(values=c("Increase in GDP" = "darkolivegreen3","Decrease in GDP"= "brown1"))+
ggtitle("Percent Change in GDP from 2017-2020 in California")+
geom_hline(yintercept = 0)
TX <- ggplot(tx.gdp %>% mutate(fill_value = ifelse(Change > 0, "Increase in GDP", "Decrease in GDP")), aes(x = Year, y = Change, fill = fill_value))+
geom_bar(stat="identity")+
scale_fill_manual(values=c("Increase in GDP" = "darkolivegreen3","Decrease in GDP"= "brown1"))+
ggtitle("Percent Change in GDP from 2017-2020 in Texas")+
geom_hline(yintercept = 0)
RI <- ggplot(ri.gdp %>% mutate(fill_value = ifelse(Change > 0, "Increase in GDP", "Decrease in GDP")), aes(x = Year, y = Change, fill = fill_value))+
geom_bar(stat="identity")+
scale_fill_manual(values=c("Increase in GDP" = "darkolivegreen3","Decrease in GDP"= "brown1"))+
ggtitle("Percent Change in GDP from 2017-2020 in Rhode Island")+
geom_hline(yintercept = 0)
UT <- ggplot(ut.gdp %>% mutate(fill_value = ifelse(Change > 0, "Increase in GDP", "Decrease in GDP")), aes(x = Year, y = Change, fill = fill_value))+
geom_bar(stat="identity")+
scale_fill_manual(values=c("Increase in GDP" = "darkolivegreen3","Decrease in GDP"= "brown1"))+
ggtitle("Percent Change in GDP from 2017-2020 in Utah")+
geom_hline(yintercept = 0)
label_2020 <- paste0(
"<b>2020:</b> ", state.df$X2020, "<br>"
)
paletteBins <- c(-8, -6, -4, -2, 0)
colorPalette <- colorBin(palette = "RdYlGn", domain = state.df$X2020, na.color = "transparent", bins = paletteBins)
leaflet(states) %>%
addTiles() %>%
setView(lat = 39.8097, lng = -98.5556, zoom=4) %>%
addPolygons(
stroke = TRUE,
fillColor = ~colorPalette(state.df$X2020),
color = 'Black',
weight = 1.5,
label = ~lapply(label_2020, htmltools::HTML)) %>%
addLegend(pal = colorPalette, values = state.df$X2020, opacity = 0.9, title = "GDP Percent Change in 2020", position="bottomleft")
label_2019 <- paste0(
"<b>2019:</b> ", state.df$X2019, "<br>"
)
paletteBins <- c(-1, 0, 1, 2, 3, 4, 5, 6)
colorPalette <- colorBin(palette = "Greens", domain = state.df$X2019, na.color = "transparent", bins = paletteBins)
leaflet(states) %>%
addTiles() %>%
setView(lat = 39.8097, lng = -98.5556, zoom=4) %>%
addPolygons(
stroke = TRUE,
fillColor = ~colorPalette(state.df$X2019),
color = 'Black',
weight = 1.5,
label = ~lapply(label_2019, htmltools::HTML)) %>%
addLegend(pal = colorPalette, values = state.df$X2019, opacity = 0.9, title = "GDP Percent Change in 2019", position="bottomleft")
# Filtering data so that only CA, TX, RI, and UT are included
fltr_industry <- gdp.industry.df %>%
filter(Name == "California " | Name == "Texas"| Name == "Utah "| Name == "Rhode Island ") %>%
filter(Industry == "Health" | Industry == "Arts" | Industry == "Food" | Industry == "Technical" | Industry == "Real_Estate")
industry_plot <- ggplot(data = fltr_industry, aes(x = Name, y = Percent_Change))+
geom_bar(stat="identity", fill = "skyblue2")+
facet_wrap(~Industry, nrow=1)+
ggtitle("Percent Change in GDP across Industries")
industry_plot
GDP by State We were able to conclude that all states had a decline in their GDP during 2020. However, there were significant differences between the four states that we chose to examine. We found that Rhode Island had the largest GDP decline of -8% whereas Utah had a very minor difference of -1.7%. When we compared the heavily populated states, we found that California had a -2.3% decrease and Texas had a -4.5% decrease. It was surprising to see that there was such a low decrease in GDP for Utah in comparison to the other states, however, Utah was one of the only states in the U.S. which did not enforce a statewide lockdown. As we briefly studied earlier, in the resource published by the New York Times: “See Which States and Cities Have Told Residents to Stay at Home” only three counties in the state issued stay at home orders. Of those counties, Summit County is a popular tourist destination and Salt Lake City houses 1.2 million of Utah’s 3.2 million residents. We can assume that most businesses and services were able to remain open throughout the pandemic in Utah, with the exception of tourist locations and their largest city, which is why they had the smallest decrease in GDP. California issued a stay at home order on Mar 19th whereas Texas waited until Apr 2nd, which might have caused the difference of almost a double in GDP declines. In terms of industries impacted by COVID-19, we were thoroughly surprised that real estate had soared during the pandemic in all states. All states saw an increase in arts and food, with only Utah seeing an increase in health services. We also noticed that California experienced an increase in the tech sector, which is most likely brought forth by Silicon Valley; california also experienced an increase in real estate, which might be due to tech workers looking to work from home full time and hence, looking for more spacious homes.
GDP Nationally When we analyzed the GDP changes for the United States, we were able to infer that state which experienced high amounts of growth in 2019 experienced high amounts of GDP loss in 2020. This may be due to construction projects and/or business and services which were pioneered in 2019 or booming in 2019 coming to a sharp halt in 2020 with unknown plans for reopening.
Unemployment Rate Timeline Unemployment Rate has been directly impacted by the COVID-19 pandemic, however, by creating line charts to help us visualize the timeline of unemployment rates, new cases, and new deaths, we were able to see a stark difference the early half of 2020 and the mid/latter of 2020 and early parts of 2021. In the beginning of the pandemic, most local and state governments had strong responses which enforced lockdowns and closures of businesses that were at high risk for spreading the disease (salons, preschools, restaurants, and more). In this time period the unemployment rate and monthly change in the first month of the pandemic (Mar-Apr ‘20) skyrocketed. As cases and deaths rose, more businesses shut down and more individuals were laid off (Airbnb, Tripadvisor layoffs). There was a compounding shift, however, towards the beginning of the summer of 2020 where unemployment rates began to consistently decrease regardless of cases and deaths increasing or decreasing. This may be due to businesses reopening under new CDC safety guidelines such as outdoor dining, plexiglass separations, the rise in demand for jobs such as delivery persons and other essential services. As we learned more about how to keep ourselves safe, more individuals were able to return to work or find new work, which decreases unemployment rates. In Oct ‘20 to Jan ‘21, the unemployment rate continues to cruise downward whereas cases and deaths reach an all time high. This may be due to state and local governments reopening too soon, providing individuals with unemployment again but with a higher risk of being infected with the virus (as cases and death rates were climbing during this period.
Conclusion Overall, this was an extremely interesting data science project to work on. We learned a tremendous amount about how to organize data into usable formats, especially how to work with different types of data such as time data, categorical reports, and connecting visualizations to reasons behind the trend lines. We hope our findings can help inform how the U.S. economy was impacted by COVID-19 specifically in terms of unemployment and GDP at a state and national level.
United States COVID-19 Cases and Deaths by State over Time Dataset from the U.S. Center for Disease Control and Prevention.
Labor Force Statistics from the Current Population Survey Dataset from the U.S. Bureau of Labor Statistics.
Gross Domestic Product by State, 4th Quarter 2020 and Annual 2020 (Preliminary) from U.S. Bureau of Economic Analysis.
Interactive Choropleth Maps from R Journalism.
R and Leaflet to create interactive choropleth maps from Towardsdatascience.
Filter with Date data from Exploratory.io.
R Markdown: The Definitive Guide from Bookdown.
Lesson 2. Twitter Data in R Using Rtweet: Analyze and Download Twitter Data from EarthLab.
How to Generate Word Clouds in R from Towardsdatascience.
We have created a GitHub Repository with our entire codebase and data sets for future reference.
Please note: we have also saved PNG images of our graphs and added them to the ‘Graph-Images’ folder in our GitHub Repository!